Goto

Collaborating Authors

 candidate model



Supplemental Material for " Model Selection for Production System via Automated Online Experiments " A Experiment Details

Neural Information Processing Systems

We use the default setting of BO in GPyOpt, where the surrogate model is a Gaussian process (GP) regression model with a Gaussian noise distribution and a Mátern 5/2 kernel. However, for the recommender system experiment, there are no natural representations for the candidate models. Off-policy evaluation (OPE) methods can provide an estimate of the accumulative metric. IS-g and DR-g suffer from the fact that there is no exploration mechanism. We simulate the "online" deployment scenario as follows: a multi-class classifier is given a set of inputs; for each input, the classifier returns a prediction of the label and only a binary immediate feedback about whether the predicted class is correct is available.



production systems (MSPS), in which model selection is achieved by sequentially deploying a list of candidate models

Neural Information Processing Systems

We thank the reviewers for the in-depth reviews. We will first answer the comments shared by multiple reviewers and then answer the individual comments. We will add additional details about sparse GP, VI and binary observation in suppl. A/B tests for a few weeks and then select the best model, which is the common scenario in industry. Model selection for a time sensitive system is an interesting and open research question for future work.


Scalable branch-and-bound model selection with non-monotonic criteria including AIC, BIC and Mallows's $\mathit{C_p}$

Vanhoefer, Jakob, Körner, Antonia, Doresic, Domagoj, Hasenauer, Jan, Pathirana, Dilan

arXiv.org Machine Learning

Model selection is a pivotal process in the quantitative sciences, where researchers must navigate between numerous candidate models of varying complexity. Traditional information criteria, such as the corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), and Mallows's $\mathit{C_p}$, are valuable tools for identifying optimal models. However, the exponential increase in candidate models with each additional model parameter renders the evaluation of these criteria for all models -- a strategy known as exhaustive, or brute-force, searches -- computationally prohibitive. Consequently, heuristic approaches like stepwise regression are commonly employed, albeit without guarantees of finding the globally-optimal model. In this study, we challenge the prevailing notion that non-monotonicity in information criteria precludes bounds on the search space. We introduce a simple but novel bound that enables the development of branch-and-bound algorithms tailored for these non-monotonic functions. We demonstrate that our approach guarantees identification of the optimal model(s) across diverse model classes, sizes, and applications, often with orders of magnitude computational speedups. For instance, in one previously-published model selection task involving $2^{32}$ (approximately 4 billion) candidate models, our method achieves a computational speedup exceeding 6,000. These findings have broad implications for the scalability and effectiveness of model selection in complex scientific domains.


HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model Companions

Ma, Shaoyin, Song, Jie, Wang, Huiqiong, Sun, Li, Song, Mingli

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have made remarkable progress in their ability to interact with external interfaces. Selecting reasonable external interfaces has thus become a crucial step in constructing LLM agents. In contrast to invoking API tools, directly calling AI models across different modalities from the community (e.g., HuggingFace) poses challenges due to the vast scale (> 10k), metadata gaps, and unstructured descriptions. Current methods for model selection often involve incorporating entire model descriptions into prompts, resulting in prompt bloat, wastage of tokens and limited scalability. To address these issues, we propose HuggingR$^4$, a novel framework that combines Reasoning, Retrieval, Refinement, and Reflection, to efficiently select models. Specifically, We first perform multiple rounds of reasoning and retrieval to get a coarse list of candidate models. Then, we conduct fine-grained refinement by analyzing candidate model descriptions, followed by reflection to assess results and determine if retrieval scope expansion is necessary. This method reduces token consumption considerably by decoupling user query processing from complex model description handling. Through a pre-established vector database, complex model descriptions are stored externally and retrieved on-demand, allowing the LLM to concentrate on interpreting user intent while accessing only relevant candidate models without prompt bloat. In the absence of standardized benchmarks, we construct a multimodal human-annotated dataset comprising 14,399 user requests across 37 tasks and conduct a thorough evaluation. HuggingR$^4$ attains a workability rate of 92.03% and a reasonability rate of 82.46%, surpassing existing method by 26.51% and 33.25% respectively on GPT-4o-mini.



Controllable Heterogeneous Model Aggregation for Personalized Federated Learning

Neural Information Processing Systems

Several methods have emerged to aggregate diverse client models; however, they either lack the ability of personalization, raise privacy and security concerns, need prior knowledge, or ignore the capability and functionality of personalized models.